Summary

The nucleotide sequence of the 3'-end of the genomic RNA of human coronavirus 
OC43 (HCV-OC43) was determined from the cDNA clones of the intracellular 
virus-specific mRNAs. The nucleotide sequence and the predicted amino acid 
sequence of the main open reading frame (ORF), which represents the nucleocapsid 
(N) protein, were highly homologous to those of bovine coronavirus (BCV) Mebus 
strain. This ORF predicts a protein of 448 amino acids. Additional smaller ORFs 
are also present in a different reading frame. We have also determined the leader 
sequence present at the 5'-end of HCV-OC43 mRNAs by a primer extension study. 
This sequence is highly homologous to that of mouse hepatitis virus, particularly in 
the 3'-end of the leader sequence, which is postulated to be involved in the unique 
mechanism of leader-primed transcription. These data suggest that HCV-OC43 and 
BCV might have diverged from each other fairly recently and that the 3'-end of the 
leader sequence has significant functional roles. 


Human coronavirus; Nucleocapsid sequence; Leader RNA 


Human coronaviruses (HCV) are important human respiratory pathogens, 
responsible for almost 20% of common colds in winter seasons (McIntosh, 1974). 
They have also been associated with diarrhea of unknown etiology, such as infantile 
necrotizing enterocolitis (Resta et al., 1985). Furthermore, coronaviruses have been 
isolated from autopsied brain of multiple sclerosis patients (Burks et al., 1980), 
although the identities of these viruses have been questioned (Weiss, 1983) and the 
work has not been repeated. HCV can be divided into two antigenically distinct 
groups: one is represented by HCV-OC43, which shares antigenicity with 
coronaviruses of some other species, such as bovine coronavirus (BCV) and mouse 
hepatitis virus (MHV) (Macnaughton et al., 1981). The other group is represented 
by HCV-229E, which shares antigenicity with porcine transmissible gastroenteritis 
virus (TGEV) and canine coronavirus (CCV) (Pedersen et al., 1978). The close 
homology between HCV-OC43 and BCV, in particular, has been demonstrated by 
oligonucleotide fingerprinting analysis of the genomic RNA of these two viruses, 
which showed remarkable similarity (Lapps and Brian, 1985), and also by 
immunoprecipitation of individual viral proteins with specific antisera (Hogue et al., 
1984). These studies suggested that HCV-OC43 and BCV may have diverged fairly 
recently. 

Coronaviruses, as a group, contain a single-stranded, positive-sense RNA genome 
of 6 × 10^6 to 8 × 10^6 daltons (Lai and Stohlman, 1978; Lomniczi and Kennedy, 1977). 
In infected cells, the viral RNA genome is first transcribed into a full-length 
negative-strand RNA, which, in turn, is transcribed into a positive-sense genomic 
RNA and six subgenomic mRNAs (Lai et al., 1981). These mRNAs are arranged in 
a 3'-coterminal nested set structure, in which the sequence of every mRNA is 
contained within the sequence of the next larger mRNA (Lai et al., 1981). 
Furthermore, each mRNA contains an approximately 70-nucleotide leader RNA which is 
derived from the 5'-end of the genomic RNA (Lai et al., 1982, 1983, 1984; Spaan et 
al., 1983). The derivation of the leader RNA takes place by a mechanism of 
“leader-primed transcription”, in which the leader RNA is transcribed independently, 
dissociates from the template and then binds to the template at the 
downstream transcriptional start sites (Baric et al., 1983; Makino et al., 1986). This 
mechanism appears to involve recognition of a stretch of repeat sequences present at 
both the 3'-end of the leader sequence and the transcriptional start sites of the 
template RNA (Shieh et al., 1987; Makino et al., 1988). 

Relatively little of the molecular biology of human coronaviruses has been 
studied. It has been shown that the size and species of virion RNA and intracellular 
viral RNAs of at least one human coronavirus, 229E, are comparable to those of 
other coronaviruses (MacNaughton, 1978; Weiss, 1983). Also, both HCV-OC43 and 
HCV-229E consist of at least three structural proteins: a nucleocapsid (N) protein 
of 52 kDa, an envelope peplomer (E2) protein of 190 kDa and a matrix (E1) protein 
of approximately 26 kDa (Schmidt and Kenny, 1982; Hogue et al., 1984). Several 
additional minor proteins, such as a glycoprotein of 140 kDa, have also been 
reported (Hogue et al., 1984; Hierholzer et al., 1972). 

In this report, we studied the sequences of both the 3'- and 5'-ends of the genomic 
RNA of HCV-OC43. We determined the sequence of the nucleocapsid gene, which 
is very strongly conserved between HCV-OC43 and BCV. We 
also found that the leader sequence of HCV-OC43, particularly at its 3'-end, is 
highly homologous to that of MHV. 

The HCV-OC43 strain (McIntosh et al., 1967) was originally obtained from Dr. 
Marion Cooney of the University of Washington, Seattle, and was propagated on a 
human rectal tumor (HRT) cell line (Tompkins et al., 1974). The virus harvested 
from the medium of infected cell culture was purified according to published 
procedures (Makino et al., 1984) and viral RNA was extracted as described 
(Kamahora et al., 1979). The purified viral genomic RNA was initially used for 
cDNA cloning using oligo (dT) as a primer for reverse transcription. Subsequent 
cDNA cloning followed essentially the procedures of Gubler and Hoffman (1983), 
as modified by Shieh et al. (1987). Several cDNA clones specific for HCV-OC43 
were obtained; one of these, T16, has a 0.6 kilobase (kb) insert and hybridized to all 
of the virus-specific subgenomic mRNA species in Northern blot analysis (data not 
shown). This result suggests that T16 represents a cDNA clone of the 3'-end of the 
genomic RNA since the subgenomic mRNAs of coronaviruses have a 3'-end 
co-terminal nested-set structure (Stern and Kennedy, 1980; Lai et al., 1981); thus, 
only the cDNA probe representing the 3'-end of the genomic RNA could hybridize 
to all of the subgenomic mRNA species. 
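
The hybridization argument above can be illustrated with a minimal sketch. It assumes each mRNA is an interval that runs from its own 5' start to the shared 3'-end of the genome; the genome length and all start coordinates below are hypothetical values chosen for illustration, not measurements from HCV-OC43.

```python
# Hypothetical coordinates for illustration only; they are not taken from
# the HCV-OC43 genome.
GENOME_LENGTH = 30_000

# 5' start of each mRNA (1-based). Every mRNA extends to the 3'-end of the
# genome, which is what makes the set 3'-coterminal and nested.
MRNA_STARTS = {
    "mRNA 1": 1,  # genome-length RNA
    "mRNA 2": 21_000,
    "mRNA 3": 23_000,
    "mRNA 4": 25_000,
    "mRNA 5": 26_500,
    "mRNA 6": 27_500,
    "mRNA 7": 28_300,
}


def detected_by(probe_start: int, probe_end: int) -> list[str]:
    """Return the mRNAs that share at least one nucleotide with the probe."""
    return [
        name
        for name, start in MRNA_STARTS.items()
        if probe_end >= start and probe_start <= GENOME_LENGTH
    ]


# A 600-nucleotide probe from the extreme 3'-end overlaps every mRNA.
three_prime_probe = detected_by(GENOME_LENGTH - 599, GENOME_LENGTH)

# An internal probe of the same size overlaps only the larger mRNAs.
internal_probe = detected_by(22_000, 22_599)

print(three_prime_probe)
print(internal_probe)
```

In this toy model the 3'-end probe detects all seven species, whereas the internal probe detects only the two largest, which mirrors the reasoning used to assign T16 to the 3'-end.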

Additional cDNA clones were obtained using poly(A)-containing intracellular 
RNA from HCV-OC43-infected HRT cells as template and oligo(dT) as primer. 
The virus-specific clones were identified with a nick-translated T16 as the probe. 
The positive clones were further tested by Northern blot analysis of the intracellular 
RNA. All of the cDNA clones selected hybridized specifically to HCV-OC43-specific 
mRNAs, similar to the mRNA patterns detected with T16 as the probe (data not 
shown). Thus, these cDNA clones represent sequences of the 3'-end of HCV-OC43 
RNA, overlapping at least part of the 3'-most gene, which encodes the N protein. 

Several representative cDNA clones were sequenced. The sequencing strategy is 
shown in Fig. 1. Sequencing was carried out by Sanger’s dideoxyribonucleotide 
chain termination method (Sanger et al., 1977). In some cases, deoxyinosine 
triphosphate (Bankier and Barrell, 1983) was used in place of deoxyguanosine 
triphosphate to reduce GC compression. Clone A3 has a poly(A) tract of 19 
nucleotides at its 3'-end and represents a nearly full-length cDNA clone of mRNA 7 
(about 1700 bases). Clone M6 is 2.4 kb in length, the 3'-terminal 1.6 kb of which 
overlaps with the A3 clone. However, its 5'-terminal 0.8 kb is distinct from the 
corresponding region of A3. Thus, clone M6 probably represents a cDNA clone 
of mRNA 6. This interpretation is consistent with the presence of the consensus 
intergenic sequence in this clone, at the position upstream of the open reading frame 
(ORF) of mRNA 7 (see below). 

The complete sequence of the 3'-terminal 1.7 kb of the HCV-OC43 RNA, 
covering the entire nucleocapsid gene, was obtained from both A3 and M6 clones. 
This sequence is shown in Fig. 2. The A3 and M6 clones are identical from 
nucleotide 69 (numbered with reference to the mRNA 7 sequence; see below) to the 3'-end. The 
entire sequence was translated in the three possible reading frames (Fig. 3). The largest 
ORF can code for a protein of 448 amino acids. This predicted protein is highly 
homologous to the corresponding ORF of BCV Mebus strain (Lapps et al., 1987) 
(Fig. 2). These two proteins have identical predicted numbers of amino acids. There 
are 44 nucleotide differences and 11 amino acid substitutions between the two 
strains. This protein is most likely the nucleocapsid protein of the virus. There are 
two additional smaller ORFs on the first reading frame. These two ORFs could 
potentially encode proteins of 60 and 108 amino acids, respectively. This structure is 
different from that of BCV RNA since the corresponding region of BCV RNA has a 
single larger continuous ORF capable of encoding a protein of 207 amino acids 
(Lapps et al., 1987). The functional significance of these ORFs is not clear. 

Fig. 1. Restriction endonuclease map and sequencing strategy for HCV-OC43 cDNA clones A3, T16 and 
M6. Restriction enzyme sites are: Ps, Pst I; Hh, Hha I; Pv, Pvu II; Ss, Sst I; R, Rsa I; Ha, Hae III; M, 
Mbo II. The arrows with the mark (*) indicate the sequences determined by using both dGTP and dITP. 
The Pst I sites at the ends of each fragment are the cloning sites on the vector. The corresponding mRNA 
structures are included for comparison. L represents the leader sequence. 
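
The three-frame translation used to locate the N gene and the smaller ORFs can be sketched as a simple scan for ATG-initiated ORFs in each forward reading frame. This is an illustrative sketch, not the analysis software used in the study; the function name `find_orfs` and the short test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int, int]]:
    """Scan the three forward frames of a virus-sense sequence for ORFs.

    Returns (frame, start, end, n_codons) tuples, where frame is 1-3,
    start is the 1-based position of the A of the first ATG, end is the
    1-based position of the last base of the stop codon, and n_codons
    counts the coding codons (stop codon excluded).
    """
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG since the previous stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame + 1, start + 1, i + 3, n_codons))
                start = None
    return orfs


# Hypothetical toy sequence with one ORF in frame 1 and one in frame 2.
toy = "ATGAAACCCTAAGATGGGTTGA"
print(find_orfs(toy))
```

Applied to the full A3 sequence, a scan of this kind would be expected to report the 448-codon N ORF together with the shorter ORFs in another frame; the `min_codons` threshold is a convenience for filtering out very short ORFs.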

The 5'-ends of the clones A3 and M6 are divergent as shown by the two 
branched sequences in Fig. 2. At the juncture of divergence, both clones contain a 
stretch of sequence, TCTAAAT (nucleotides 69-75 of A3), which is very similar to 
the postulated consensus leader RNA binding site of MHV (Shieh et al., 1987). In 
the case of MHV, this stretch of sequence is present at both the 3'-end of the leader 
RNA sequence and the intergenic regions of the genomic RNA and subgenomic 
RNAs (Shieh et al., 1987). Thus, the 5'-ends of these two clones may represent the 
sequences of leader RNA and part of mRNA 6-coding sequence, respectively. To 
determine the leader sequence of HCV-OC43 mRNA, we performed a primer-extension 
study using a 32P-5'-end-labeled oligodeoxyribonucleotide (5'-CATCCTTAAAATTTA-3') 
complementary to nucleotides 71 to 85 (the underlined region in 
Fig. 2) as the primer and mRNA 7 as the template. The cDNA product extended 
with reverse transcriptase was sequenced by the Maxam and Gilbert method (1980). 
The sequence determined by this method is identical to the 5'-end sequence of the 
clone A3. Similar primer extension studies have also been performed using the same 
intergenic region * * ** * * ACACCGCATTGTTGAGAAATAAT 

1 5'-TTGTGAGCGAAGTTGCGTGCGTGCVrTCCCGCTTCACCTGATCTCTTGTTAGATCTTTTTCTAATCTAATCTAAATTTTAAGG 82 


83 ATGTCTTTTACTCCTGGTAAGCAATCCAGTAGTAGAGCGTCCTCTGGAAATCGTTCTGGTAATCGCATCCTCAAGTGGGCCGATCAGTCC 172 

1 MSFTPGKQSSSRASSGNRSGNGILKWADQS 30 


173 GACCAGGTTAGAAATGTTCAAACCAGGGGTAGAAGAGCTCAACCCAAGCAAACTGCTACCTCTCAGCAACCATCAGGAGGGAATGTTGTA 262 

31 DQVRNVQTRGRRAQPKQTATSQQPSGGNVV 60 

g L BCV 

263 CCCTACTATTCTTGGTTCTCTGGAATTACTCAGTTTCAAAAGGGAAAGGAGTTTGAGTTTGTAGAAGGACAAGGTGTGCCTATTGCACCA 352 

61 PYYSWFSGITQFQKGKEFEFVEGQGPPIAP 90 

A BCV 


353 GGAGTCCCAGCTACTGAAGCTAAGGGGTACTGGTACAGACACAACAGAGGTTCTTTTAAAACAGCCGATGGCAACCAGCGTCAACTGCTG 442 
91 GVPATEAKGYWYRHNRGSFKTADGNQRQLL 120 

BCV 

443 CCACGATGGTATTTTTACTATCTGGGAACAGGACCGCATGCTAAAGACCAGTACGGCACCGATATTGACGGAGTCTACTGGGTCGCTAGT 532 
121 PRWYFYYLGTGPHAKDQYGTDIDGVYWVAS 150 

F BCV 

533 AACCAGGCTGATGTCAATACCCCGGCTGACATTGTCGATCGGGACCCAAGTAGCGATGAGGCTATTCCGACTAGGTTTCCGCCTGGCACG 622 
151 NQADVNTPADIVDRDPSSDEAIPTRFPPGT 180 

L BCV 

623 GTACTCCCTCAGGGTTACTATATTGAAGGCTCAGGAAGGTCTGCTCCTAATTCCAGATCTACTTCGCGCACATCCAGCAGAGCCTCTAGT 712 
181 VLPQGYYIEGSGRSAPNSRSTSRTSSRASS 210 

A BCV 

713 GCAGGATCGCGTAGTAGAGCCAATTCTGGCAATAGAACCCCTACCTCTGGTGTAACACCTGACATGGCTGATCAAATTGCTAGTCTTGTT 902 
211 AGSRSRANSGNRTPTSGVTPDMADQIASLV 240 

BCV 

903 CTGGCAAAACTTGGCAAGGATGCCACTAAACCTCAGCAAGTAACTAAGCATACTGCCAAAGAAGTCAGACAGAAAATTTTGAATAAGCCC 992 
241 LAKLGKDATKPQQVTKHTAKEVRQKILNKP 270 

Q I BCV 

993 CGCCAGAAGAGGAGCCCCAATAAACAATGCACTGTTCAGCAGTGTTTTGGTAAGAGAGGCCCTAATCAGAATTTTGGTGGTGGAGAAATG 1082 
271 RQKRSPNKQCTVQQCFGKRGPNQNFGGGEM 300 

BCV 

1083 TTAAAACTTGGAACTAGTGACCCACAGTTCCCCATCTTTGCAGAACTCGCACCCACAGCTGGTGCGTTTTTCTTTGGATCAAGATTAGAG 1172 
301 LKLGTSDPQFPILAELAPTAGAFFFGSRLE 330 

BCV 

1173 TTGGCCAAAGTGCAGAATTTATCTGGGAATCCTGACGAGCCCCAGAAGGATGTTTATGAATTGCGCTATAACGGCGCAATTAGGTTTGAC 1262 
331 LAKVQNLSGNPDEPQKDVYELRYNGAIRFD 360 

L BCV 

1263 AGTACACTTTCAGGTTTTGAGACCATAATGAAGGTGTTGAATGAGAATTTGAATGCATATCAACAACAAGATGGTATGATGAATATGAGT 1352 
361 STLSGFETIMKVLNENLNAYQQQDGMMNMS 390 

BCV 

1353 CCAAAACCACAGCGTCAGCGTGGTCATAAGAATGGACAAGGAGAAAATGATAATATAAGTGTTGCAGTGCCCAAAAGCCGCGTGCAGCAA 1442 
391 PKPQRQRGHKNGQGENDNISVAVPKSRVQQ 420 

A BCV 

1443 AATAAGAGTAGAGAGTTGACTGCAGAGGACATCAGCCTTCTTAAGAAGATGGATGAGCCCTATACTGAAGACACCTCAGAAATATAAGAG 1532 
421 NKSRELTAEDISLLKKMDEPYTEDTSEIZ 448 

BCV 

1533 AATGAACCTTATGTCGGCATCTGGTGGTAACCCCTCGCAGAAAAGTCGAGATAAGGCACTCTCTATCAGAATGGATGTCTTGCTGCTATA 1622 
1623 ATAGATAGAGAAGGTTATAGCAGACTATAGATTAATTAGTTGAAAGTTTTGTGTTGTAATGTATAGTGTTGGAGAAAGTGAAAGACTTGC 1712 
1713 GGAAGTAATTGCCGACAAGTGCCCAAGGGAAGAGCCAGCATGTTAAGTTACCACCCAGTAATTAGTAAATGAATGAAGTTAATTATGGCC 1802 
1803 AATTGGAAGAATCAC-POLY (A) -3' 1817 

Fig. 2. The primary nucleotide sequence of clone A3 and the deduced amino acid sequence for the 
largest ORF. The underlined region (nucleotides 71 through 85) was used for preparing a 
complementary oligodeoxyribonucleotide to serve as a primer for primer-extension studies (Lai et al., 1984) and 
subsequent determination of the leader sequence. The sequence preceded by an asterisk and above the 
nucleotides 46 through 68 was obtained from clone M6 and represents the partial sequences of mRNA 6 
and the intergenic region between mRNAs 6 and 7. The individual amino acid codes beneath the translated 
continuous amino acids correspond to amino acid differences in the BCV Mebus strain (Lapps et al., 
1987), as compared to the HCV-OC43 strain. 
Fig. 3. Schematic diagram of possible open reading frames obtained when translating the A3 nucleotide 
sequence as virus-sense RNA. Vertical bars above the line represent the methionine codon that could 
serve as the initiation site for translation. Vertical bars below the line represent termination codons. N, 
sequence of the nucleocapsid protein gene. Orf-1, Orf-2, possible short open reading frames. 


synthetic primer and mRNA 6 as the template. The sequence determined from the 
primer-extended cDNA product was identical to that determined from clone M6 
(the sequence preceded by asterisks) (Fig. 2). Thus, the juncture where the sequences 
of M6 and A3 clones diverge represents the leader-mRNA fusion site. 
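
As a consistency check on the primer described above, the following minimal sketch confirms that the oligodeoxyribonucleotide is the reverse complement of nucleotides 71 to 85. The fragment is the portion of the A3 sequence around the divergence point as read from Fig. 2; its start coordinate (53) is an inference from the text, which places TCTAAAT at nucleotides 69-75, and the helper names are hypothetical.

```python
# Nucleotides 53-85 (1-based) of the mRNA 7 sequence, read from Fig. 2.
# The start coordinate is inferred from the text, which places TCTAAAT at
# nucleotides 69-75 and the N initiation codon directly after nucleotide 82.
FRAGMENT_START = 53
FRAGMENT = "TCTTTTTCTAATCTAATCTAAATTTTAAGGATG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def region(first: int, last: int) -> str:
    """Return nucleotides first..last (1-based, inclusive) of the fragment."""
    return FRAGMENT[first - FRAGMENT_START:last - FRAGMENT_START + 1]


target = region(71, 85)
primer = reverse_complement(target)
print(target, primer)
```

Under these assumptions the computed primer is 5'-CATCCTTAAAATTTA-3', in agreement with the sequence given in the text.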

The sequence analysis presented in this communication confirmed previous 
reports of a close relationship between HCV-OC43 and BCV as revealed by 
serological analysis (Pedersen et al., 1978; Gerna et al., 1981), immunoprecipitation of 
virion structural proteins (Hogue et al., 1984) and oligonucleotide fingerprinting of 
genomic RNA (Lapps and Brian, 1985). Among the 448 amino acids predicted for 
the N protein, only 11 amino acids differ between the two viruses, which represents 
97.5% homology. This result suggests that these two viruses have only recently 
diverged from each other. Although the remaining sequences of their genomic RNAs 
have not been compared, they are likely to be very homologous as well since their 
genomic RNA fingerprints are extremely similar (Lapps and Brian, 1985). 
Interestingly, these two viruses have different target cell specificity in vitro and infect 
different animals. Furthermore, HCV-OC43 causes mainly respiratory illness while 
BCV mainly affects the gastrointestinal system. Thus, these two viruses may provide 
a useful tool for understanding the molecular basis of these biological properties of 
coronaviruses. 
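
The 97.5% figure quoted above follows directly from the counts given in the text (448 aligned residues, 11 substitutions). A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def percent_identity(aligned_length: int, differences: int) -> float:
    """Percentage of aligned positions that are identical."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= differences <= aligned_length:
        raise ValueError("differences must lie between 0 and aligned_length")
    return 100.0 * (aligned_length - differences) / aligned_length


# N protein: 448 predicted amino acids, 11 substitutions relative to BCV.
n_protein_identity = percent_identity(448, 11)
print(f"{n_protein_identity:.1f}%")
```

This treats the two proteins as aligned without gaps, which is reasonable here because both predicted proteins have the same number of amino acids.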

An additional difference between the sequences of the 3'-end of the genomic 
RNA of the two viruses is that BCV has a second ORF on a different reading frame 
from that of the N gene. This ORF has the potential to code for a protein of 207 
amino acids (Lapps et al., 1987). In contrast, the corresponding ORF in HCV-OC43 
RNA is interrupted by a termination codon, leaving two smaller ORFs, potentially 
capable of coding for proteins of 60 and 108 amino acids, respectively. It is not clear 
whether these ORFs are functional; notably, they have an 
optimal translation initiation signal, according to Kozak’s consensus sequences 
(Kozak, 1984, 1986), in both BCV and HCV-OC43 RNA. It will be interesting to 
determine whether such proteins are synthesized in HCV-OC43 or BCV-infected 
cells. 
Fig. 4. Comparison of leader sequences between HCV-OC43 and MHV-JHM (Shieh et al., 1987), and 
between HCV-OC43 and infectious bronchitis virus (IBV) (Brown et al., 1986). Asterisks represent 
nucleotides missing at the corresponding positions and were used to assist sequence alignment. The 
underlined region denotes UCUAA repeats. 


Our data also show that the leader sequences of HCV-OC43 and MHV are very 
well conserved (73%) (Fig. 4). The sequence conservation is particularly notable in 
the 3'-half of the leader sequence, in which 26 of 27 nucleotides are perfectly 
matched (96% homology). It has been shown that this region contains the binding 
sites for the leader RNA to template RNA during the transcription of coronavirus 
mRNAs (Shieh et al., 1987; Makino et al., 1988). In contrast, the 5'-end of the 
leader RNA was not as well conserved (60% homology). This conservation pattern is 
consistent with the functional significance of the 3'-end of the leader RNA in 
coronavirus RNA transcription (Shieh et al., 1987). It is notable that only one 
UCUAA sequence (nucleotides 69-73) overlaps between mRNA 6 and mRNA 7, 
while there are three UCUAA repeats in the leader region of mRNA 7 (Fig. 2). The 
UCUAA sequence has been implicated in leader RNA binding (Makino et al., 
1988). The presence of multiple UCUAA repeats in the leader RNA predicts that 
mRNA 7 of HCV-OC43 may be heterogeneous in number of UCUAA repeats, as 
has been shown for MHV mRNAs (Makino et al., 1988). When compared to IBV 
(Brown et al., 1986), which belongs to a separate antigenic group of coronaviruses, 
the sequence conservation is not as remarkable (52%) (Fig. 4). The significance of 
the leader sequence conservation will require further analysis of additional 
coronavirus strains. 
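
The UCUAA repeats discussed above can be located with a short motif scan. The sketch below uses the 3'-terminal part of the leader and the start of the N gene as read from Fig. 2 (written as DNA and converted to RNA); the fragment's start coordinate (53) is inferred from the text, and the function name is hypothetical.

```python
# Nucleotides 53-85 (1-based) of the mRNA 7 sequence as read from Fig. 2,
# written as DNA. The start coordinate is inferred from the text, which
# places one UCUAA copy at nucleotides 69-73.
FRAGMENT_START = 53
FRAGMENT_DNA = "TCTTTTTCTAATCTAATCTAAATTTTAAGGATG"


def motif_positions(seq: str, motif: str, offset: int = 1) -> list[int]:
    """Return the start position of every occurrence of motif in seq.

    Overlapping occurrences are included. Positions are reported relative
    to offset, the coordinate assigned to the first character of seq.
    """
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(motif, i + 1)
    return positions


rna = FRAGMENT_DNA.replace("T", "U")
repeats = motif_positions(rna, "UCUAA", offset=FRAGMENT_START)
print(repeats)
```

Under these assumptions the scan reports three tandem copies starting at nucleotides 59, 64 and 69; the last is the copy at nucleotides 69-73 that is shared by mRNA 6 and mRNA 7.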

We have previously shown that different strains of MHV could freely exchange 
the leader RNA on their subgenomic mRNAs during mixed infection (Makino et 
al., 1986). Furthermore, different strains of MHV could undergo RNA 
recombination at a very high frequency (Makino et al., 1986; Keck et al., 1988). The 
phenomena of leader RNA reassortment and RNA recombination have not been 
demonstrated between coronaviruses of different species. The finding presented in 
this report that the leader RNAs of HCV-OC43 and MHV are highly homologous, 
particularly in the 3'-terminal region where leader RNA binds to template RNA, 
suggests that the leader RNA of HCV-OC43 and MHV might be exchangeable 
during mixed infection. Furthermore, the sequence homology in the nucleocapsid 
gene and possibly other genes as well suggests that these two viruses might be able 
to undergo interspecies RNA recombination. Such possibilities are 
currently being investigated in our laboratory. 